Lexicons and grammars for language processing: industrial or handcrafted products?

Author

  • Éric Laporte
Abstract

In recent years, the use of linguistic data for language processing (semantic ambiguity resolution, translation...) has increased progressively. Such data are now commonly called language resources. A few years ago, nearly all the language resources used for this purpose were collections of texts such as the Brown Corpus and the Penn Treebank, but the use of electronic lexicons (WordNet, FrameNet, VerbNet, ComLex...) and formal grammars (TAG...) has developed recently. This development is slow because most processes for constructing lexicons and grammars are manual, whereas the construction of corpora has always been highly automated.


Similar resources

FreeLing 3.0: Towards Wider Multilinguality

FreeLing is an open-source multilingual language processing library providing a wide range of analyzers for several languages. It offers text processing and language annotation facilities to NLP application developers, lowering the cost of building those applications. FreeLing is customizable, extensible, and has a strong orientation to real-world applications in terms of speed and robustness. ...


Things between Lexicon and Grammar

A number of grammar formalisms were proposed in the 1980s, such as Lexical Functional Grammars, Generalized Phrase Structure Grammars, and Tree Adjoining Grammars. Those formalisms then started to put stress on the lexicon, and were called lexicalist (or lexicalized) grammars. Representative examples of lexicalist grammars were Head-driven Phrase Structure Grammars (HPSG) and Lexicalized Tree Adjoi...


MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...


Outilex, plate-forme logicielle de traitement de textes écrits

The Outilex software platform, which will be made available to research, development and industry, comprises software components implementing all the fundamental operations of written text processing: processing without lexicons, exploitation of lexicons and grammars, language resource management. All data are structured in XML formats, and also in more compact formats, either readable or bina...


Constraints in Computational

Research reported in this paper a) extends the familiar notions of constraints and preferences in computational semantic analysis and generation; b) adapts constraint satisfaction techniques to the requirements of natural language processing; and c) combines i) large-scale static knowledge sources (grammars, ontologies and lexicons) with ii) processing algorithms and iii) an advanced control ar...




Publication date: 2009